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1  Statement  of  the  problem  studied 

Traditional  intrusion  detection  systems  (IDSs)  focus  on  low-level  attacks  or  anomalies,  and  raise  alerts 
independently,  though  there  may  be  logical  connections  between  them.  In  situations  where  there  are  intensive 
attacks,  not  only  will  actual  alerts  be  mixed  with  false  alerts,  but  the  amount  of  alerts  will  also  become 
unmanageable.  As  a  result,  it  is  difficult  for  human  users  or  intrusion  response  systems  to  understand  the  alerts 
and  take  appropriate  actions. 

The  objective  of  this  project  is  to  develop  techniques  and  tools  to  facilitate  the  automatic  (or  semi-automatic) 
analysis  of  IDS  alerts.  In  particular,  we  have  thoroughly  investigated  the  following  issues:  (1)  construction  of 
attack  scenarios  from  IDS  alerts,  (2)  efficient  and  effective  analysis  of  large  sets  of  IDS  alerts,  (3)  profiling  and 
learning  of  attack  strategies  from  correlated  alerts,  (4)  hypothesizing  and  reasoning  about  attacks  missed  by 
IDSs,  (5)  integration  of  intrusion  evidence  from  IDSs  and  other  complementary  information  sources,  (6)  alert 
correlation  when  there  are  privacy  concerns,  (7)  systematic  development  of  the  knowledge  base  required  for 


alert  correlation  in  our  approach,  and  (8)  vulnerability  analysis  of  MANET  routing  protocols  to  facilitate  the 
application  of  alert  correlation  in  MANET  applications.  We  have  made  significant  progress  in  this  project  on  all 
these  issues,  as  described  next. 


2  Summary  of  the  most  import  results 

Under  the  support  of  this  grant,  we  have  obtained  a  number  of  important  results,  which  are  summarized 
below: 

2.1  A  formal  framework  for  correlating  intrusion  alerts  and  constructing  high-level  attack  scenarios 

Our  first  accomplishment  is  a  formal  framework  for  correlating  intrusion  alerts  and  constructing  high-level 
attack  scenarios  based  on  the  low-level  intrusion  alerts.  Our  techniques  address  an  important  problem  in  the 
management  of  IDSs.  Our  method  can  be  explained  easily  based  on  the  following  observation:  most  intrusions 
are  not  isolated,  but  related  as  different  stages  of  attacks,  with  the  early  stages  preparing  for  the  later  ones.  For 
example,  in  Distributed  Denial  of  Service  (DDOS)  attacks,  the  attacker  has  to  install  the  DDOS  daemon 
programs  in  vulnerable  hosts  before  he/she  can  instruct  the  daemons  to  launch  an  attack.  In  other  words,  an 
attacker  has  to  (or  usually  does)  reach  a  certain  state  before  he/she  can  carry  out  certain  attacks,  and  usually 
reaches  the  state  by  launching  some  other  attacks. 

Based  on  this  observation,  we  correlate  alerts  using  prerequisites  and  consequences  of  intrusions.  Intuitively, 
the  prerequisite  of  an  intrusion  is  the  necessary  condition  for  the  intrusion  to  be  successful,  while  the 
consequence  of  an  intrusion  is  the  outcome  of  the  intrusion  if  it  is  successful.  For  example,  the  existence  of  a 
vulnerable  service  is  the  prerequisite  of  a  remote  buffer  overflow  attack  against  the  service,  and  as  the 
consequence  of  the  attack,  the  attacker  may  gain  access  to  the  host.  Accordingly,  we  correlate  the  alerts 
together  when  the  attackers  launch  some  early  attacks  to  prepare  for  the  prerequisites  of  some  later  ones.  For 
example,  if  they  use  a  UDP  port  scan  to  discover  the  vulnerable  services,  followed  by  an  attack  against  one  of 
the  services,  we  can  correlate  the  corresponding  alerts  together.  In  addition,  our  formalism  provides  an  intuitive 
representation  of  correlated  alerts  and  a  specific  mechanism  for  alert  correlation,  which  leads  to  our 
implementation  of  the  method. 

The  details  of  the  above  result  can  be  found  in  papers  No.  4  and  No.  12  listed  in  section  3. 

2.2  Interactive  utilities  for  analyzing  intensive  intrusion  alerts 

Our  second  accomplishment  is  the  development  of  a  set  of  utilities  for  interactive  analysis  of  intensive 
intrusion  alerts.  Each  utility  takes  a  set  of  hyper-alerts  as  input.  Depending  on  the  output,  these  utilities  can  be 
divided  into  two  classes:  hyper-alert  generating  utilities  and  feature  extraction  utilities.  A  hyper-alert  generating 
utility  outputs  one  or  multiple  sets  of  hyper-alerts,  while  a  feature  extraction  utility  only  outputs  the  properties 
of  the  input  hyper-alerts.  We  have  developed  six  utilities,  including  alert  aggregation/disaggregation,  focused 
analysis,  clustering  analysis,  frequency  analysis,  link  analysis,  and  association  analysis.  The  first  three  utilities 
are  hyper-alert  generating  utilities,  while  the  last  three  are  feature  extraction  utilities. 

These  utilities  are  intended  for  human  users  to  analyze  and  understand  the  correlated  alerts  as  well  as  the 
strategies  behind  them.  We  studied  the  effectiveness  of  these  utilities  through  a  case  study  with  the  network 
traffic  captured  at  the  DEF  CON  8  Capture  the  Flag  (CTF)  event.  Our  results  showed  that  they  could  effectively 
simplify  the  analysis  of  large  amounts  of  alerts.  Our  analysis  also  revealed  several  attack  strategies  that 
appeared  in  the  DEF  CON  8  CTF  contest. 
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The  details  of  the  development  of  the  three  utilities  and  the  experimental  evaluation  of  them  can  be  found  in 
papers  No.  4  and  No.  13  in  section  3. 

2.3  Multi-level  key  chain  method  for  scalable  broadcast  authentication  in  distributed  sensor 
networks 

In  studying  the  applicability  of  our  techniques  to  wireless  networks,  we  developed  a  multi-level  key  chain 
method  based  on  uTESLA  to  enable  cost-effective  and  scalable  broadcast  authentication  in  low-end  sensor 
networks.  Broadcast  authentication  is  an  essential  service  in  distributed  sensor  networks.  Because  of  the  large 
numbers  of  sensor  nodes  and  the  broadcast  nature  of  the  communication  in  distributed  sensor  networks,  it  is 
usually  desirable  for  the  base  stations  to  broadcast  commands  and  data  to  the  sensor  nodes.  In  hostile 
environments  (e.g.,  battle  field,  anti-terrorists  operations),  it  is  necessary  to  enable  the  sensor  nodes  to 
authenticate  the  broadcast  messages  received  from  the  base  station 

The  only  practical  broadcast  authentication  protocol  in  low-end  sensor  networks  (before  our  result)  is 
uTESLA,  which  is  adapted  from  a  stream  authentication  protocol  called  TESLA.  However,  uTESLA  requires 
that  the  base  station  unicast  the  initial  parameters  to  the  sensor  nodes  individually.  This  feature  severely  limits 
the  application  of  uTESLA  in  large  sensor  networks.  To  address  this  problem,  we  developed  an  extension  to 
uTESLA.  The  basic  idea  is  to  predetermine  and  broadcast  the  initial  parameters  required  by  uTESLA  instead 
of  unicast-based  message  transmission.  In  the  simplest  form,  our  extension  distributes  the  uTESLA  parameters 
during  the  initialization  of  the  sensor  nodes  (e.g.,  along  with  the  master  key  shared  between  each  sensor  and  the 
base  station).  To  provide  more  flexibility,  especially  to  prolong  the  lifetime  of  uTESLA  without  requiring  a 
very  long  key  chain,  we  introduced  a  multi-level  key  chain  scheme,  in  which  the  higher-level  key  chains  are 
used  to  authenticate  the  commitments  of  lower-level  ones.  To  further  improve  the  survivability  of  the  scheme 
against  message  loss  and  Denial  of  Service  (DOS)  attacks,  we  used  redundant  message  transmission  and 
random  selection  strategies  to  deal  with  the  messages  that  distribute  key  chain  commitments.  The  resulting 
scheme  removes  the  requirement  of  unicast-based  initial  communication  between  base  station  and  sensor  nodes 
while  keeping  the  nice  properties  of  uTESLA  (e.g.,  tolerance  of  message  loss,  resistance  to  replay  attacks).  Our 
implementation  and  experiments  further  demonstrated  that  our  scheme  could  tolerate  high  channel  loss  rate  and 
is  resistant  to  certain  known  DOS  attacks  to  a  certain  degree. 

The  details  of  the  above  result  can  be  found  in  papers  No.  3  and  No.  1 1  listed  in  section  3. 

2.4  Adapting  query  optimization  techniques  for  efficient  intrusion  alert  correlation 

Intrusion  alert  correlation  is  the  process  to  identify  high-level  attack  scenarios  by  reasoning  about  low-level 
alerts  raised  by  IDSs.  The  efficiency  of  intrusion  alert  correlation  is  critical  in  enabling  interactive  analysis  of 
intrusion  alerts  as  well  as  prompt  responses  to  attacks.  In  this  part  of  work,  we  adapted  a  number  of  main 
memory  index  structures  (e.g.,  T  Trees,  Linear  Hashing)  and  database  query  optimization  techniques  (e.g., 
nested  loop  join,  sort  join)  to  speed  up  intrusion  alert  correlation.  By  taking  advantage  of  the  characteristics  of 
the  alert  correlation  process,  we  developed  three  techniques  named  hyper-alert  container,  two-level  index ,  and 
sort  correlation.  We  also  performed  a  series  of  experiments  designed  to  evaluate  the  effectiveness  of  these 
techniques.  These  experiments  demonstrate  that  (1)  hyper- alert  containers  improve  the  efficiency  of  order¬ 
preserving  index  structures  (e.g.,  T  Trees),  with  which  an  insertion  operation  involves  search,  (2)  two-level 
index  improves  the  efficiency  of  all  index  structures,  (3)  a  two-level  index  structure  combining  Chained  Bucket 
Hashing  and  Linear  Hashing  is  the  most  efficient  for  streamed  alerts  with  and  without  memory  constraint,  and 
(4)  sort  correlation  with  heap  sort  algorithm  is  the  most  efficient  for  alert  correlation  in  batch. 

More  details  can  be  found  in  paper  No.  9  listed  in  section  3. 
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2.5  Insider  attacks  against  mobile  ad-hoc  routing  protocols 


In  this  part  of  work,  we  performed  a  systematic  analysis  of  insider  attacks  against  mobile  ad-hoc  routing 
protocols,  using  the  Ad  hoc  On-Demand  Distance  Vector  (AODV)  protocol  as  an  example.  We  identify  a 
number  of  attack  goals  and  then  study  how  to  achieve  these  goals  through  misuses  of  the  routing  messages.  To 
facilitate  the  analysis,  we  classify  the  insider  attacks  into  two  categories:  atomic  misuses  and  compound 
misuses.  Atomic  misuses  are  performed  by  manipulating  a  single  routing  message,  which  cannot  be  further 
divided;  compound  misuses  are  composed  of  combinations  of  atomic  misuses  and  possibly  normal  uses  of  the 
routing  protocol.  Our  analysis  reveals  several  classes  of  insider  attacks,  including  route  disruption,  route 
invasion,  node  isolation,  and  resource  consumption.  To  validate  our  results,  we  have  implemented  and  studied 
these  attacks  through  simulation,  which  demonstrate  the  impact  of  these  attacks. 

Additional  details  can  be  found  in  papers  No.  1  and  No.  10  listed  in  section  3. 

2.6  Learning  attack  strategies  from  intrusion  alerts 

Understanding  the  strategies  of  attacks  is  crucial  for  security  applications  such  as  computer  and  network 
forensics,  intrusion  response,  and  prevention  of  future  attacks.  In  this  part  of  research,  we  developed  techniques 
to  automatically  learn  attack  strategies  from  intrusion  alerts.  Central  to  these  techniques  is  a  model  that 
represents  an  attack  strategy  as  a  graph  of  attacks  with  constraints  on  the  attack  attributes  and  the  temporal  order 
among  these  attacks.  To  learn  the  intrusion  strategy  is  then  to  extract  such  a  graph  from  a  sequence  of  intrusion 
alerts.  To  further  facilitate  the  analysis  of  attack  strategies,  which  is  essential  to  many  security  applications 
such  as  computer  and  network  forensics  and  incident  handling,  we  developed  techniques  to  measure  the 
similarity  between  attack  strategies.  The  basic  idea  is  to  reduce  the  similarity  measurement  of  attack  strategies 
into  error-tolerant  graph  isomorphism  problem,  and  measure  the  similarity  between  attack  strategies  in  terms  of 
the  cost  to  transform  one  strategy  into  another.  Finally,  we  have  performed  a  series  of  experiments,  which 
demonstrate  the  potential  of  the  aforementioned  techniques. 

Additional  details  about  this  work  can  be  found  in  paper  No.  8  listed  section  3. 

2.7  Hypothesizing  and  reasoning  about  attacks  missed  by  intrusion  detection  systems 

Several  alert  correlation  methods  were  proposed  in  the  past  several  years  to  construct  high-level  attack 
scenarios  from  low-level  intrusion  alerts  reported  by  IDSs.  These  correlation  methods  have  different  strengths 
and  limitations;  none  of  them  clearly  dominate  the  others.  However,  all  of  these  methods  depend  heavily  on  the 
underlying  IDSs,  and  perform  poorly  when  the  IDSs  miss  critical  attacks.  In  order  to  improve  the  performance 
of  intrusion  alert  correlation  and  reduce  the  impact  of  missed  attacks,  we  developed  a  series  of  techniques  to 
integrate  two  complementary  types  of  alert  correlation  methods:  (1)  those  based  on  the  similarity  between  alert 
attributes,  and  (2)  those  based  on  prerequisites  and  consequences  of  attacks.  In  particular,  we  developed 
techniques  to  hypothesize  and  reason  about  attacks  possibly  missed  by  IDSs  based  on  the  indirect  causal 
relationship  between  intrusion  alerts  and  the  constraints  they  must  satisfy.  We  also  investigated  additional 
techniques  to  validate  the  hypothesized  attacks  through  raw  audit  data  and  to  consolidate  the  hypothesized 
attacks  to  generate  concise  attack  scenarios.  Our  experimental  results  demonstrate  the  potential  of  these 
techniques  in  building  high-level  attack  scenarios  and  reasoning  about  possibly  missed  attacks. 

Additional  details  about  this  work  can  be  found  in  paper  No.  2  and  No.  7  listed  in  section  3. 


4 


2.8  Integrating  complementary  intrusion  evidence  from  multiple  information  sources 


We  developed  techniques  to  integrate  and  reason  about  complementary  intrusion  evidence  such  as  intrusion 
alerts  generated  by  IDSs  and  reports  by  system  monitoring  or  vulnerability  scanning  tools.  To  facilitate  the 
modeling  of  intrusion  evidence,  we  classify  intrusion  evidence  into  either  event-based  evidence  or  state-based 
evidence.  Event-based  evidence  refers  to  observations  (or  detections)  of  intrusive  actions  (e.g.,  IDS  alerts), 
while  state-based  evidence  refers  to  observations  of  the  effects  of  intrusions  on  system  states.  Based  on  the 
interdependency  between  event-based  and  state-based  evidence,  we  developed  techniques  to  automatically 
integrate  complementary  evidence  into  Bayesian  networks,  and  reason  about  uncertain  or  unknown  intrusion 
evidence  based  on  verified  evidence.  The  experimental  results  demonstrated  the  potential  of  the  proposed 
techniques.  In  particular,  additional  observations  by  system  monitoring  or  vulnerability  scanning  tools  can 
potentially  reduce  the  false  alert  rate  and  increase  the  confidence  in  alerts  corresponding  to  successful  attacks. 

Additional  details  about  this  work  can  be  found  in  paper  No.  5  listed  in  section  3. 

2.9  Alert  Correlation  through  Triggering  Events  and  Common  Resources 

Complementary  security  systems  are  widely  deployed  in  networks  to  better  protect  digital  assets.  Alert 
correlation  is  essential  to  understand  the  security  threats  and  take  appropriate  actions.  Here  we  propose  a  novel 
correlation  approach  based  on  triggering  events  and  common  resources.  One  of  the  key  concepts  in  our 
approach  is  triggering  events,  which  are  the  (low-level)  events  that  trigger  alerts.  By  grouping  alerts  that  share 
"similar"  triggering  events,  a  set  of  alerts  can  be  partitioned  into  different  clusters  where  each  cluster  may 
correspond  to  the  same  attack.  Our  approach  further  examines  whether  the  alerts  in  each  cluster  are  consistent 
with  relevant  network  and  host  configurations,  which  helps  analysts  partially  identify  the  severity  of  alerts  and 
clusters.  The  other  key  concept  in  our  approach  is  input  and  output  resources.  Intuitively,  input  resources  are 
the  necessary  resources  for  an  attack  to  succeed,  and  output  resources  are  the  resources  that  an  attack  supplies  if 
successful.  We  model  each  attack  through  specifying  input  and  output  resources.  By  identifying  the  "common" 
resources  between  output  resources  of  one  attack  and  input  resources  of  another,  our  approach  discovers  causal 
relationships  between  alert  clusters  and  builds  attack  scenarios.  The  preliminary  experimental  results 
demonstrate  the  usefulness  of  the  proposed  techniques. 

We  also  investigated  how  to  use  resources  to  facilitate  the  specification  of  hyper-alert  types  (or,  equivalently, 
attacks),  which  are  required  in  our  correlation  method.  We  organize  resources  into  trees,  where  the  nodes  in  the 
trees  are  labeled  with  conditions  (represented  by  predicates).  To  specify  the  prerequisite  and  consequence  of  an 
attack,  we  first  look  for  the  desirable  resource  trees  related  to  the  attack's  prerequisite  and  consequence,  then 
traverse  the  trees  to  find  the  appropriate  nodes,  and  finally  select  the  suitable  predicates  to  put  into  the 
prerequisite  and  consequence.  Our  approach  is  simple  and  less  expert- dependent.  The  usability  study  and 
comprehensiveness  study  (with  more  than  350  attack  types)  demonstrate  the  effectiveness  of  our  approach. 

Additional  details  about  this  work  can  be  found  in  papers  No.  6  and  No.  16  listed  in  section  3. 

2.10  Privacy-Preserving  Alert  Correlation  Using  Concept  Hierarchies 

With  the  increasing  security  threats  from  infrastructure  attacks  such  as  worms  and  distributed  denial  of 
service  attacks,  it  is  clear  that  the  cooperation  among  different  organizations  is  necessary  to  defend  against  these 
attacks.  However,  organizations'  privacy  concerns  for  the  incident  and  security  alert  data  require  that  sensitive 
data  be  sanitized  before  they  are  shared  with  other  organizations.  Such  sanitization  process  usually  has  negative 
impacts  on  intrusion  analysis  (such  as  alert  correlation).  To  balance  the  privacy  requirements  and  the  need  for 
intrusion  analysis,  we  propose  a  privacy-preserving  alert  correlation  approach  based  on  concept  hierarchies.  Our 
approach  consists  of  two  phases.  The  first  phase  is  entropy  guided  alert  sanitization ,  where  sensitive  alert 
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attributes  are  generalized  to  high-level  concepts  to  introduce  uncertainty  into  the  dataset  with  partial  semantics. 
To  balance  the  privacy  and  the  usability  of  alert  data,  we  propose  to  guide  the  alert  sanitization  process  with  the 
entropy  or  differential  entropy  of  sanitized  attributes.  The  second  phase  is  sanitized  alert  correlation.  We  focus 
on  defining  similarity  functions  between  sanitized  attributes  and  building  attack  scenarios  from  sanitized  alerts. 
Our  preliminary  experimental  results  demonstrate  the  effectiveness  of  the  proposed  techniques  in  terms  of 
various  measures  (e.g.,  correct  classification  rates,  false  alert  rates,  and  detection  rates). 

More  details  of  this  work  can  be  found  in  paper  No.  15  in  section  3. 
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